Multilingual Computational Semantic Lexicons in Action: The WYSINNWYG Approach to NLP. A Cross-linguistic Investigation on Spatially-based Expressions

Author

  • Evelyne Viegas
Abstract

Much effort has been put into computational lexicons over the years, and most systems give much room to (lexical) semantic data. However, in these systems, the effort put into studying and representing lexical items so as to express the underlying continuum that exists in 1) language vagueness and polysemy, and 2) language gaps and mismatches, has remained embryonic. A sense enumeration approach fails, from a theoretical point of view, to capture the core meaning of words, let alone to relate word meanings to one another, and it complicates the task of NLP by multiplying ambiguities in analysis and choices in generation. In this paper, I study computational semantic lexicon representation from a multilingual point of view, reconciling different approaches to lexicon representation: i) vagueness, for lexemes whose semantics is coarser or finer grained than that of their counterparts in other languages; ii) underspecification, for lexemes which have multiple related facets; and iii) lexical rules, to relate systematic polysemy to systematic ambiguity. I build on a What You See Is Not Necessarily What You Get (WYSINNWYG) approach to provide the NLP system with the "right" lexical data, already tuned towards a particular task. To do so, I argue for a lexical semantic approach to lexicon representation. I exemplify my study through a cross-linguistic investigation on spatially-based expressions.

In this paper, I argue for computational semantic lexicons as active knowledge sources that provide Natural Language Processing (NLP) systems with the "right" lexical semantic representation for a particular task. In other words, lexicon entries are "pre-digested", via a lexical processor, to best fit an NLP task. This What You See (in your lexicon) Is Not Necessarily What You Get (as input to your program) (WYSINNWYG) approach requires the adoption of a symbolic paradigm.
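A small count makes the claim that sense enumeration "multiplies ambiguities in analysis" concrete. This is a minimal Python sketch, not the author's system. The two toy lexicons, their sense labels and the helper `readings` are hypothetical. Treating Spanish *puerta* like English *door* (Aperture vs. PhysicalObject) is an assumption made only for illustration.

```python
from itertools import product

# Hypothetical toy lexicons for illustration; sense labels are assumptions.
# Sense-enumeration style: every sense of a lexeme is listed separately.
ENUMERATED = {
    "puerta": ["Aperture", "PhysicalObject"],
    "en": ["Location-Contact", "Location-Container"],
}

# Single-entry style: one vague or underspecified meaning per lexeme.
UNDERSPECIFIED = {
    "puerta": ["Aperture|PhysicalObject"],  # underspecified between two facets
    "en": ["Location"],                     # vague between Contact and Container
}


def readings(tokens, lexicon):
    """Return every combination of senses for the tokens found in the lexicon."""
    choices = [lexicon[t] for t in tokens if t in lexicon]
    return list(product(*choices))


if __name__ == "__main__":
    tokens = "la puerta en la pared".split()
    print(len(readings(tokens, ENUMERATED)))      # 4 readings
    print(len(readings(tokens, UNDERSPECIFIED)))  # 1 reading
```

In this toy setting, each enumerated lexeme with two senses doubles the number of readings. The single-entry lexicon keeps one reading until a later step actually needs to specialize it.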
Formally, I use a combination of three different approaches to lexicon representation: (1) lexico-semantic vagueness, for lexemes whose semantics is coarser or finer grained than that of their counterparts in other languages (for instance, en in Spanish is vague between the Contact and Container senses of Location, whereas English is finer grained, with on for the former and in for the latter); (2) lexico-semantic underspecification, for lexemes which have multiple related facets (for instance, door, which is underspecified with respect to its Aperture or PhysicalObject meanings); and (3) lexical rules, to relate systematic polysemy to systematic ambiguity (such as …
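The first two mechanisms can be sketched as stored data plus a small lexical processor. This is a hypothetical Python illustration, not the paper's formalism. The `Entry` class, the relation names passed to `generate_english`, and the verb-to-facet table are assumptions. Spanish *en* is stored once as a vague Location, and the English choice between *on* and *in* is made only when generation needs it. *door* keeps both facets until some context selects one.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical lexicon entry: one stored meaning plus optional facets."""
    lexeme: str
    concept: str
    facets: tuple = ()


# Vagueness: Spanish "en" is stored once as a plain Location.
EN_ES = Entry("en", "Location")

# English is finer grained: each preposition adds one spatial relation.
ENGLISH_LOCATION = {"Contact": "on", "Container": "in"}

# Underspecification: "door" keeps both related facets in a single entry.
DOOR = Entry("door", "Door", facets=("Aperture", "PhysicalObject"))

# Hypothetical selection table: which facet a governing verb picks out.
VERB_SELECTS = {"walk through": "Aperture", "paint": "PhysicalObject"}


def generate_english(entry, relation=None):
    """Specialize a vague Location only when the target language needs it.

    Returns every English candidate if the relation is still unknown.
    """
    if entry.concept != "Location":
        raise ValueError(f"{entry.lexeme!r} is not a Location entry")
    if relation is None:
        return sorted(ENGLISH_LOCATION.values())
    return [ENGLISH_LOCATION[relation]]


def select_facet(entry, verb):
    """Return the facet a verb selects, or all facets if it selects none."""
    wanted = VERB_SELECTS.get(verb)
    return (wanted,) if wanted in entry.facets else entry.facets
```

For example, `generate_english(EN_ES, "Container")` yields `["in"]`, and `select_facet(DOOR, "paint")` yields `("PhysicalObject",)`. With an unknown relation or a verb that selects nothing, the sketch returns all candidates rather than forcing an early choice.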





Publication date: 1998